Project Overview
Agentic Browser is a next-generation browser extension designed to act as an intelligent agent that understands and controls web content. Its mission is to bridge modern LLM reasoning with real browser interactivity, enabling users to issue natural-language commands that are translated into safe, human-approved actions on live web pages. The project emphasizes model-agnostic intelligence, privacy-respecting design, and open-source extensibility, positioning itself as an adaptive, secure platform for AI-driven web automation.
The repository is organized into cohesive layers:
Core runtime and configuration
Agent orchestration and tooling
MCP server for model-agnostic communication
Browser extension (background, content scripts, UI)
Services and tools for specialized workflows
Prompts and utilities for grounded reasoning
Diagram sources
Section sources
Model-agnostic LLM integration: A unified adapter supporting multiple providers and local models.
MCP-compliant server: Exposes tools and LLM generation via the Model Context Protocol.
Agent orchestration: LangGraph-based ReAct agent with a rich toolset for web, calendar, email, and browser actions.
Browser extension: Secure background and content scripts with declarative action execution.
Prompt engineering: Specialized prompts for generating safe, structured action plans for browser automation.
Section sources
Agentic Browser follows a layered architecture:
Frontend: Extension UI and messaging channels
Backend: FastAPI server and MCP server
Agent runtime: LangGraph workflows and tooling
LLM adapters: Provider-agnostic clients
Safety: Guardrails and declarative action system
Diagram sources
Mission and Objectives#
Model-agnostic intelligence: Seamless switching across providers and local models.
Secure browser extension: WebExtensions-based design with explicit user approvals.
Advanced agent workflows: RAG, persistent memory, and multi-step automation.
Guardrails and transparency: User consent, logging, filtering, and allowlists.
Open-source extensibility: Modular architecture for community contributions.
Section sources
Key Architectural Principles#
Model-agnostic design: Unified LLM adapter supports OpenAI, Anthropic, Ollama, and others.
BYOKeys approach: Users supply their own API keys via environment or UI.
MCP compliance: Structured tool definitions and standardized LLM invocation.
Declarative action system: Natural language goals mapped to JSON action plans.
Section sources
Technical Stack Overview#
Agent orchestration: LangChain, LangGraph for stateful workflows.
Browser control: WebExtensions API for tab/window control and DOM injection.
LLM adapters: OpenRouter, Ollama, Anthropic, OpenAI, Hugging Face integrations.
Backend agent: Python MCP server exposing tools and LLM generation.
Retrieval and citation: Vector databases and RAG pipelines.
Safety and guardrails: Logging, filtering, and explicit user consent.
Section sources
Core Objectives in Practice#
Model-agnostic agent backend: Python, LangChain, MCP framework with provider adapters.
Secure browser extension: Chrome/Firefox compatible via WebExtensions.
Advanced agent workflows: RAG, persistent memory, multi-step tasks.
Guardrails and transparency: User approval, logs, filtering, allowlists.
Open-source extensibility: Modular tool and service architecture.
Section sources
Model-Agnostic LLM Adapter#
The LLM adapter encapsulates provider-specific clients behind a unified interface. It supports multiple providers, validates API keys, and constructs client instances with configurable base URLs and models.
Diagram sources
Section sources
MCP Server and Tooling#
The MCP server exposes standardized tools for LLM generation, GitHub QA, website fetching, and HTML-to-markdown conversion. It dynamically initializes LLM clients based on incoming arguments and returns structured text content.
Diagram sources
Section sources
Agent Orchestration and Tools#
The ReAct agent uses LangGraph to alternate between reasoning and tool execution. It binds tools dynamically, manages conversation context, and converts between internal and external message formats.
Diagram sources
Section sources
Browser Action Agent and Declarative Actions#
The browser action agent translates natural language goals into JSON action plans. The prompt defines available actions (DOM manipulation and tab/window control) and enforces strict output formatting and selector selection rules.
Diagram sources
Section sources
Extension Messaging and Security#
The extension uses explicit message types for agent tool execution, tab/window control, and action execution. Every action is logged and requires user consent, ensuring transparency and safety.
Diagram sources
Section sources
The project relies on a curated set of libraries for LLM integration, web automation, and agent orchestration. The dependency graph highlights the central role of LangChain/LangGraph and MCP.
Diagram sources
Section sources
Asynchronous tool execution: Tools leverage asyncio and thread pools to avoid blocking the main loop.
Provider configuration caching: LLM clients are constructed on-demand with validated credentials.
Minimal DOM manipulation: Content scripts inject only necessary scripts and dispatch minimal events.
Efficient prompting: Prompt templates are concise and output-only JSON to reduce parsing overhead.
[No sources needed since this section provides general guidance]
Common issues and resolutions:
Missing environment variables: Ensure API keys and base URLs are configured in the environment.
LLM initialization failures: Verify provider availability, base URLs, and model names.
Extension action errors: Confirm tab permissions and that the content script is injected before execution.
MCP tool errors: Validate tool names and argument schemas; check server logs for exceptions.
Section sources
Agentic Browser delivers a model-agnostic, secure, and extensible platform for intelligent web automation. By combining MCP-compliant tooling, a declarative action system, and robust guardrails, it enables users to safely automate complex web tasks while maintaining control and transparency. The modular architecture invites community contributions and positions the project as a foundation for adaptive AI browser automation.